Abstract

Background: We recently showed reduced between-subject variability in our [18F]fluorodeoxyglucose positron
emission tomography (FDG-PET) experiments using a Michaelis-Menten transport model to calculate the metabolic
tumor glucose uptake rate extrapolated to the hypothetical condition of glucose saturation:
$\mathrm{MR}_{gluc}^{max} = K_i \cdot (K_M + [glc])$, where $K_i$ is the image-derived FDG uptake rate constant,
$K_M$ is the half-saturation Michaelis constant, and [glc] is the blood glucose concentration. Compared to
measurements of $K_i$ alone, or calculations of the scan-time metabolic glucose uptake rate
($\mathrm{MR}_{gluc} = K_i \cdot [glc]$) or the glucose-normalized uptake rate
($\mathrm{MR}_{gluc} = K_i \cdot [glc]/(100\ \mathrm{mg/dL})$), we suggested that $\mathrm{MR}_{gluc}^{max}$ could
offer increased statistical power in treatment studies; here, we confirm this in theory and practice.

Methods: We compared $K_i$, $\mathrm{MR}_{gluc}$ (both with and without glucose normalization), and
$\mathrm{MR}_{gluc}^{max}$ as FDG-PET measures of treatment-induced changes in tumor glucose uptake independent of
any systemic changes in blood glucose caused either by natural variation or by side effects of drug action. Data
from three xenograft models with independent evidence of altered tumor cell glucose uptake were studied and
generalized with statistical simulations and mathematical derivations. To obtain representative simulation
parameters, we studied the distributions of $K_i$ from FDG-PET scans and blood [glucose] values in 66 cohorts of
mice (665 individual mice). Treatment effects were simulated by varying $\mathrm{MR}_{gluc}^{max}$ and
back-calculating the mean $K_i$ under the Michaelis-Menten model with $K_M$ = 130 mg/dL. This was repeated to
represent cases of low, average, and high variability in $K_i$ (at a given glucose level) observed among the 66
PET cohorts.

Results: There was excellent agreement between derivations, simulations, and experiments. Even modestly
different (20%) blood glucose levels caused $K_i$ and especially $\mathrm{MR}_{gluc}$ to become unreliable through
false positive results while $\mathrm{MR}_{gluc}^{max}$ remained unbiased. The greatest benefit occurred when
$K_i$ measurements (at a given glucose level) had low variability. Even when the power benefit was negligible, the
use of $\mathrm{MR}_{gluc}^{max}$ carried no statistical penalty. Congruent with theory and simulations,
$\mathrm{MR}_{gluc}^{max}$ showed in our experiments an average 21% statistical power improvement with respect to
$\mathrm{MR}_{gluc}$ and 10% with respect to $K_i$ (approximately 20% savings in sample size).
The results were robust in the face of imprecise blood glucose measurements and $K_M$ values.

Conclusions: When evaluating the direct effects of treatment on tumor tissue with FDG-PET, employing a 
Michaelis-Menten glucose correction factor gives the most statistically powerful results. The well-known alternative 
'correction', multiplying by blood glucose (or normalized blood glucose), appears to be counter-productive in this 
setting and should be avoided. 

Keywords: FDG-PET, Glucose correction, Michaelis-Menten, Response to treatment, Glucose bias 



* Correspondence: williams.simon@gene.com

Department of Biomedical Imaging, Genentech, Inc., 1 DNA Way, South San Francisco, CA, 94080, USA

Full list of author information is available at the end of the article

© 2012 Williams et al.; licensee Springer. This is an Open Access article distributed under the terms of the Creative Commons
Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction
in any medium, provided the original work is properly cited.

Williams et al. EJNMMI Research 2012, 2:35
http://www.ejnmmires.com/content/2/1/35



Background 

Quantitative [ 18 F]fluorodeoxyglucose positron emission 
tomography (FDG-PET) is increasingly relied upon to 
measure pharmacodynamic responses in controlled 
trials, bringing a greater need for accurate and reprodu- 
cible scans to minimize the number of subjects needed 
for a successful trial. Glucose levels have long been 
recognized as a factor modulating FDG uptake [1-8]; but 
even so, there has been some debate regarding how best 
to compensate for changing glucose levels when com- 
paring scans. Some investigators have eschewed glucose 
corrections altogether after observing increased rather 
than decreased statistical noise in 'corrected' PET mea- 
surements, attributing this, perhaps, to error in the glu- 
cose measurement itself [9,10]. However, avoiding 
glucose correction poses a conundrum of interpretation 
when a treatment may induce a systematic change in 
blood glucose levels. Such treatments are known, and 
FDG-PET may be used to assess their impact; they in- 
clude some potentially important new drugs still under 
clinical investigation, such as certain Akt and PI3K inhi- 
bitors [11,12]. 

The seminal work of Sokoloff et al. [13] described the
Michaelis-Menten kinetics of glucose and tracer transport
and showed how the radioactive tracer uptake rate
constant ($K_i$) could be used to estimate the tissue glucose
uptake in physiological units, i.e., the metabolic rate
of glucose ($\mathrm{MR}_{gluc} = K_i \cdot [glc]/\mathrm{LC}$, in μmol glucose per 100 g
tissue per min). Under steady-state conditions, the
half-saturation Michaelis constants ($K_M$) and the maximal
velocities ($V_{max}$) for tracer and glucose are factored into
the lumped constant (LC), which summarizes the differential
properties of tracer and glucose. Scans obtained
under different blood glucose levels will almost inevitably
indicate different metabolic rates of glucose, and
one must decide how to detect changes in tumor glucose
metabolism that are not merely due to changes in blood
glucose.

We recently demonstrated [14] that in untreated animals,
both tumor $K_i$ values and $\mathrm{MR}_{gluc}$ values were,
on average, strongly correlated with blood glucose,
showing that an appropriate form of blood glucose 
correction might facilitate the identification of treat- 
ment effects under changing glucose conditions. We 
sought to understand this glucose effect so that an ap- 
propriate compensating correction could be made, 
expecting that this would improve the power to detect 
treatment effects. 

The Michaelis-Menten relationship between glucose
concentration and transport [13-19] was used as the
basis of the proposed correction. With it, we showed
that, on average, there was less variability in untreated
animals when estimating the hypothetical
glucose-saturated limit to the tumor metabolic rate of
glucose ($\mathrm{MR}_{gluc}^{max}$) rather than the tracer rate constant
($K_i$) or the actual scan-time metabolic rate of glucose
($\mathrm{MR}_{gluc}$). $\mathrm{MR}_{gluc}^{max}$ is the asymptotic limit to the plot of
uptake rate versus [glucose]. $K_M$ is a half-saturation
Michaelis constant such that $\mathrm{MR}_{gluc}^{max} = K_i \cdot (K_M + [glc])$.

To demonstrate a true drug-induced treatment
effect on glucose uptake in the tumor tissue independent
of any changes in blood glucose (see Table 1 and
Additional files 1 and 2), we selected
dynamic FDG-PET scans from 60 mice treated with
inhibitors of the cell-signaling MEK and RAF tyrosine
kinases [20,21]. These have previously been reported
as modulating FDG-PET in preclinical and clinical settings
[22-24], and we have observed drug-induced
reductions in FDG uptake both in solid tumors and in
cell culture. A plausible mechanism for this reduction
was demonstrated through GLUT-1 immunofluorescence.
We analyzed data before and after 7 days of
treatment, a compromise between early read-out and
being certain that the treatment had had time to take
effect.

Because limited experimental studies alone were inad- 
equate to explore with any certainty the power relation- 
ships in (relatively noisy) FDG-PET data, we have 
supplemented these experiments with statistical simula- 
tions and with analytical derivations that are presented 
in Additional file 3. 

Methods 

The experimental setting 

Our laboratory experiments employed dynamic FDG-PET
to measure the tumor uptake rate constant for
FDG, $K_i$, as a function of tumor treatment with tyrosine
kinase inhibitor drugs. The experiments contained two
or more groups of animals: one control group administered
vehicle alone, and at least one treatment group
administered an active drug in the same dosing vehicle.
We analyzed data before and after 7 days of treatment,
expecting that there would be no difference between the
groups before treatment and that some treatment effect
would be evident after 7 days. We compared $K_i$ with two
alternative PET metrics that account for blood glucose
in some way, $\mathrm{MR}_{gluc}$ and $\mathrm{MR}_{gluc}^{max}$, to study the relative
merits of each metric at detecting a true tumor treatment
effect as seen in the two-sample two-sided t-test.
This is also the scenario the simulations (below) and
power calculations (Additional file 3) are designed to
represent.

Table 1 Treatment studies, cell lines, and drug substances

Study   Cell line   Tissue type   Drug substance      Mice
A       A375        Melanoma      GDC-0879 (BRAF)     18
B       A2058       Melanoma      G-00033054 (MEK)    18
C       HCT116      Colorectal    GDC-0973 (MEK)      24
TOTAL                                                 60
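
As a minimal sketch of the three metrics being compared, the Python function below evaluates $K_i$, $\mathrm{MR}_{gluc}$, and $\mathrm{MR}_{gluc}^{max}$ for a single scan. The function name `pet_metrics` and its argument names are illustrative assumptions, not part of the original analysis; the defaults $K_M$ = 130 mg/dL and LC = 1 are the values stated in the text.

```python
def pet_metrics(ki, glc, km=130.0, lc=1.0):
    """Return (Ki, MRgluc, MRgluc_max) for one scan.

    ki  : image-derived FDG uptake rate constant
    glc : blood glucose in mg/dL
    km  : half-saturation Michaelis constant in mg/dL (130 in the text)
    lc  : lumped constant (assumed equal to 1, as in the text)
    """
    mr_gluc = ki * glc / lc          # scan-time metabolic rate of glucose
    mr_gluc_max = ki * (km + glc)    # glucose-saturated limit
    return ki, mr_gluc, mr_gluc_max


# Hypothetical noise-free tumor with MRgluc_max = 48, scanned at two glucose levels.
for glc in (90.0, 117.0):
    ki = 48.0 / (130.0 + glc)        # Michaelis-Menten back-calculation of Ki
    print(glc, pet_metrics(ki, glc))
```

In this noise-free illustration, $K_i$ falls and $\mathrm{MR}_{gluc}$ rises as blood glucose increases, while $\mathrm{MR}_{gluc}^{max}$ is unchanged, which is the behavior the comparison is designed to probe.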

False positives 

We considered that a true treatment effect altering tumor 
glucose uptake was one based on a physiological change 
in the tumor tissue per se. Thus, for our purposes, 
changes in tumor glucose uptake caused merely by altera- 
tions in blood glucose were not true treatment effects but 
fall into our definition of false positive results. 

Laboratory experiments 
Animal handling and imaging 

Experimental details were as described previously 
[14,25]. All animals were fasted overnight with access 
to water ad libitum. Mice were induced and maintained 
under light anesthesia using isoflurane in air (GDC- 
0879 study) or sevoflurane in air (G00033054 and 
GDC-0973 studies). Body temperature was maintained 
at 37°C with warm air flows while the eyes were pro- 
tected from dehydration with ophthalmic ointment. All 
studies were conducted under the approval of Genente- 
ch's AAALAC-accredited Institutional Animal Care and 
Use Committee. All animals underwent 30-min dy- 
namic FDG-PET scans with X-ray computed tomog- 
raphy (CT)-based attenuation correction just prior to 
starting their treatment regimen. FDG doses were 
infused via the lateral tail vein over a 1-min period in a 
volume of 100 μL.

Blood glucose measurements 

At every scan, blood glucose measurements were taken 
twice: once approximately 5 min before and once shortly 
after the PET/CT scan approximately 35 min later. The 
glucose value used in subsequent calculations is the 
mean of the pre- and post-scan measurements. Data 
were collected with the commercially available Contour 
glucometer (Bayer Healthcare, Tarrytown, NY, USA) 
using blood freshly obtained by pricking the saphenous 
vein. Test-retest reproducibility measurements using this 
instrument in our hands showed a coefficient of vari- 
ation of 3.7% [14]. 

Prior use of the experimental data 

The 665 mice in 66 studies (Table 2) used here to inform 
the simulation parameters are mostly the same as those 
585 mice described in our analysis of variability [14], 
refined slightly by adding in data from newly available 
cohorts of A375, HCT116, and MEL-537 mice and re- 
moving a small number of animals for which post- 
treatment scans were unavailable (H596, A2058). 



Table 2 Animal models and number of mice

                                          Number of cohorts      Number of mice
Model   Cell line/strain                  Control   Treatment    Control   Treated
1       BT474 in SCID Nude Beige          2         2            22        22
2       HCT116 in Nu/Nu                   5         8            54        86
3       [illegible] in Nu/Nu              2         2            24        24
4       [illegible]                       1         1            10        10
5       [illegible]                       1         1            10        10
6       H596 in [illegible] transgenic    1         3            11        33
7       537-Mel in Nu/Nu                  2         4            17        31
8       A2058 in Nu/Nu                    4         10           39        99
9       A375 in Nu/Nu                     4         7            35        64
10      Colo205 in Nu/Nu                  1         1            12        12
11      H2122 in Nu/Nu                    1         3            10        30
        Subtotal                          24        42           244       421
        Total                             66                     665



Tumor treatment models with established drug effects on 
tumor glucose uptake 

Table 1 describes the subset of studies from Table 2 in 
which there was additional non-imaging evidence of a 
true treatment effect on tumor glucose uptake independ- 
ent of blood glucose levels. Athymic nude mice were 
implanted in the right flank with a Matrigel/Hanks 
Balanced Salts medium containing 10 million melanoma 
(A375, A2058) or 5 million colorectal (HCT116) cancer 
cells. Tumors reached a group median volume of at least
250 mm³ prior to beginning the study. The blood glucose
and FDG-PET data ($K_i$, $\mathrm{MR}_{gluc}$, $\mathrm{MR}_{gluc}^{max}$) are presented
for these studies in Additional file 1. Cell culture
experiments were used to show direct drug effects on 
FDG uptake, and immunofluorescence was used to show 
an apparent loss of GLUT-1 at the cell membrane both 
in cells and tumor tissue (see Additional file 2 for 
descriptions of and results for those experiments). 

Statistical power in experimental data: p-values as a 
function of sample size 

Two-sample two-sided t-test p-values were calculated in
these three true treatment studies: A, B, and C described
in Table 1. This was repeated using $\mathrm{MR}_{gluc}^{max}$, $\mathrm{MR}_{gluc}$, and
$K_i$. We examined the p-values at baseline, where the null
hypothesis should be accepted, and on treatment at day
7, where the null hypothesis should indeed be rejected
based on our knowledge of drug action on tumor cell
and tissue glucose handling (Additional file 2).

A preliminary analysis confirmed that our A375
(n = 9), A2058 (n = 9), and HCT116 (n = 12 per group)
tumor studies were powered with sufficient numbers of
animals to detect large treatment effect sizes using any
FDG-PET metric: $K_i$, $\mathrm{MR}_{gluc}$, or $\mathrm{MR}_{gluc}^{max}$. To examine
how studies with less power might perform, we undertook
the simulations described below and supplemented
those with a meta-analysis of smaller groups obtained by
sampling within our experimental data. We considered
the full cohort of animals prepared for a given study to
be the 'universe' of animals from which the smaller
groups were drawn randomly using sampling without
replacement. We calculated results (presented in Figure 1)
for every possible combination of individuals as long as
the number of combinations totaled less than 4,000;
when more combinations were possible, we randomly
sampled 4,000 cases to generate our results.
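
The subgroup resampling described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: the helper names `subgroup_pairs` and `student_t` are hypothetical, and counting the combinations jointly across the control and treatment groups is one reading of the text. The sketch stops at the pooled-variance t statistic; converting it to a p-value requires the t distribution (for example via SciPy's `scipy.stats.ttest_ind`), which is omitted to keep the block dependency-free.

```python
import itertools
import math
import random
import statistics


def subgroup_pairs(control, treated, k, max_cases=4000, seed=0):
    """Enumerate (control, treated) subgroups of size k, or sample max_cases of them."""
    n_total = math.comb(len(control), k) * math.comb(len(treated), k)
    if n_total < max_cases:
        # Few enough combinations: enumerate every one exactly once.
        return [(c, t)
                for c in itertools.combinations(control, k)
                for t in itertools.combinations(treated, k)]
    # Too many combinations: draw max_cases random subgroup pairs instead.
    # Each subgroup is drawn without replacement within its own group.
    rng = random.Random(seed)
    return [(tuple(rng.sample(control, k)), tuple(rng.sample(treated, k)))
            for _ in range(max_cases)]


def student_t(x, y):
    """Pooled-variance two-sample t statistic."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * statistics.variance(x)
              + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))
```

With nine animals per group and subgroups of eight, all 81 pairs are enumerated; with twelve animals per group and subgroups of eight, the number of pairs far exceeds 4,000, so random sampling is used.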

False positive rates in experimental data: relation to 
sample size 

Mice were randomized into nominal control and treatment
groups, each containing n = 6 to 12 mice (Table 2),
allowing 42 comparisons of two-sample two-sided t-tests
to be performed on FDG-PET data collected before any
treatment was administered. At this timepoint, a
statistically significant result was considered to represent
a false positive result. A particular study was flagged
as having a high rate of false positives whenever the
t-tests rejected the null hypothesis (p < 0.05) more
often than the theoretical false positive rate (α) of 5%,
measured across all the combinations of individuals
tested. Meta-analysis of progressively smaller subsets
as described above was used to assess how the false
positive error rate would behave in smaller, less
powerful, studies. This was repeated using $\mathrm{MR}_{gluc}^{max}$,
$\mathrm{MR}_{gluc}$, and $K_i$.

[Figure 1 panels: (A) HCT116 colon cancer treated with MEK inhibitor (GDC-0973); (B) A2058 melanoma treated with MEK inhibitor (G00033054); (C) A375 melanoma treated with BRAF inhibitor (GDC-0879). x-axis: sample size; y-axis: p-value (log scale).]

Figure 1 Experimental statistical power at day 7 post-dose. Three panels correspond to three animal models from Table 2. Each shows
Student's t-test results from treatment comparisons of control and treatment groups of mice as a function of sample size and using three PET
metrics. (A) HCT116 colorectal cancer in Nu/Nu mice; (B) A2058 melanoma in Nu/Nu mice; (C) A375 melanoma in Nu/Nu mice.
Results were calculated for the full group size of n animals and for all possible combinations of individuals (limited to a maximum of 4,000
random samples) studied in four progressively smaller subsets (x-axis). The y-axis (log10 scale) indicates the significance level p-value. The purple
dashed line indicates a significance level of 0.05. Every boxplot includes a bold horizontal line that indicates the median p-value. The box length
shows the interquartile range (25% to 75%), and the whiskers show minimum and maximum observed p-values.

Pharmaceuticals 

GDC-0879 is a B-RAF [20] selective kinase inhibitor 
[26,27] that has been demonstrated to be effective 
against cancers carrying the V600 mutation [28]. MEK is 
one of the three enzymes of the mitogen-activated pro- 
tein kinase (MAPK) cascade involved with RAS/RAF sig- 
naling [21]. G00033054 and GDC-0973 are potent and 
selective MEK inhibitors that have been efficacious in 
treating KRAS and RAF mutant cells [29]. 

All drug substances were dosed daily in 100 μL of excipient.
GDC-0879, GDC-0973, and G00033054 were
dosed for 7 days at 100 mg/kg, 10 mg/kg, and 25 mg/kg, 
respectively. All animals were dosed through oral gavage 
(per os). Control groups were subjected to the same regi- 
men but received no active drug in their dosing solution. 

Derivations, statistics, and simulations 

We studied the properties of the two-sample two-sided
t-test comparing sample means of $K_i$ and $\mathrm{MR}_{gluc}^{max}$
between control and treatment groups in
analytical derivations (presented as Additional file 3) and
in simulations which are described below. Data were
simulated assuming either no treatment effect or assuming
a treatment effect of 10% to 50% change in the
glucose-saturated limit to the tumor glucose uptake rate,
$\mathrm{MR}_{gluc}^{max}$, as specified in each simulation. As a function of
the involved parameters, our study evaluated the test
statistics under both the null and alternative hypotheses
by estimation of false positives (including significant test
results caused merely by changes in blood glucose) and
the power to detect true differences in the tumor glucose
uptake rate limit. Simulations were run in the
statistical programming language R [30].

We assumed that the relationship between the FDG
rate constant $K_i$ and glucose [glc] followed the
Michaelis-Menten (MM) form [14-19] and that observations
of the rate constant were corrupted by noise.
That is, the observed rate constant was given by
$K_i = \mathrm{MR}_{gluc}^{max}/(K_M + [glc]) + \epsilon$, where $\epsilon$ is zero-mean
Gaussian with variance $\sigma_\epsilon^2$, here denoted as $\epsilon \sim N(0, \sigma_\epsilon^2)$.

Let $\bar{K}_i^{C}$ and $\bar{K}_i^{T}$ represent the sample average FDG uptake
rates across n observations in the control and treatment
groups, respectively, and let $\overline{\mathrm{MR}}_{gluc}^{max,C}$ and $\overline{\mathrm{MR}}_{gluc}^{max,T}$ be
the sample averages of the quantity $K_i \cdot (K_M + [glc])$ in
the two groups. Under these assumptions, we compared
the statistical properties of the t-test comparing $\bar{K}_i^{C}$ and
$\bar{K}_i^{T}$ with the t-test comparing $\overline{\mathrm{MR}}_{gluc}^{max,C}$ and $\overline{\mathrm{MR}}_{gluc}^{max,T}$.

The analytical derivation of the power functions relating
to $K_i$ and $\mathrm{MR}_{gluc}^{max}$ follows standard developments
based on the Gaussian distribution [31] and is
presented for the interested reader in Additional file 3.
To illustrate the validity of the derivation and to delineate
when $\mathrm{MR}_{gluc}^{max}$ provides significantly improved
statistical properties vis-a-vis $K_i$, we simulated observations
from the joint process ($K_i$, [glc]) as follows.

Given the parameters $\{\mathrm{MR}_{gluc}^{max}, K_M, \mu_g, \sigma_g^2, \sigma_\epsilon^2\}$, a single
draw of ($K_i$, [glc]) was obtained by first sampling
$[glc] \sim N(\mu_g, \sigma_g^2)$ and $\epsilon \sim N(0, \sigma_\epsilon^2)$, and then by evaluating
$K_i = \mathrm{MR}_{gluc}^{max}/(K_M + [glc]) + \epsilon$. For each simulation
iteration, the preceding was repeated n times each in
the control and treatment groups, and two-sided
t-tests were used to test for equality of means at the
α = 0.05 level of significance. A total of 4,000 simulation
iterations were used in each setting.
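
The sampling scheme just described can be sketched in Python as follows. The original simulations were run in R; this is an independent illustration, not the authors' code. All function names are hypothetical, the defaults are the S1 values quoted in the text, and the t-test is assumed to be the equal-variance (pooled) form, with standard tabulated two-sided 5% critical values hard-coded for n = 6 to 12 per group.

```python
import math
import random
import statistics

# Two-sided 5% critical values of Student's t for df = 2n - 2 (standard table values).
T_CRIT = {6: 2.228, 7: 2.179, 8: 2.145, 9: 2.120, 10: 2.101, 11: 2.086, 12: 2.074}


def draw_group(rng, mr_max, mu_g, sigma_g, sigma_e, n, km):
    """Draw n (Ki, glc) pairs from the Michaelis-Menten model with Gaussian noise."""
    pairs = []
    for _ in range(n):
        glc = rng.gauss(mu_g, sigma_g)
        ki = mr_max / (km + glc) + rng.gauss(0.0, sigma_e)
        pairs.append((ki, glc))
    return pairs


def rejects(x, y, n):
    """Pooled-variance two-sample t-test at the two-sided 5% level (equal group sizes)."""
    pooled = (statistics.variance(x) + statistics.variance(y)) / 2.0
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(pooled * 2.0 / n)
    return abs(t) > T_CRIT[n]


def rejection_rates(mr_c, mr_t, mu_c, mu_t, sigma_g=25.0, sigma_e=0.048,
                    n=12, km=130.0, iters=4000, seed=1):
    """Return the fraction of iterations rejecting H0 for (Ki, MRgluc_max)."""
    rng = random.Random(seed)
    hits_ki = hits_mr = 0
    for _ in range(iters):
        c = draw_group(rng, mr_c, mu_c, sigma_g, sigma_e, n, km)
        t = draw_group(rng, mr_t, mu_t, sigma_g, sigma_e, n, km)
        hits_ki += rejects([k for k, _ in c], [k for k, _ in t], n)
        hits_mr += rejects([k * (km + g) for k, g in c],
                           [k * (km + g) for k, g in t], n)
    return hits_ki / iters, hits_mr / iters


# Null hypothesis (same MRgluc_max) with a 30% shift in mean glucose, setting S1.
print(rejection_rates(48.0, 48.0, 90.0, 117.0))
```

Passing different values of `mr_c` and `mr_t` with equal glucose means gives the corresponding power estimate under the alternative hypothesis.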

To get representative simulations, we chose parameter
values based on output from fitting the MM model to
FDG-PET data from each of the 66 (as-yet-untreated)
experimental cohorts of mice described in Table 2. For
these studies, with the half-rate Michaelis constant set at
$K_M$ = 130 mg/dL [14], the scatter plot in Figure 2 shows
estimates of $\mathrm{MR}_{gluc}^{max}$ versus $\sigma_\epsilon$. For $\mathrm{MR}_{gluc}^{max}$, the sample
mean and standard deviation were 47.9 and 12.7, respectively
(range = 31.0 to 92.0), and for $\sigma_\epsilon$, they were
0.048 and 0.018, respectively (range = 0.022 to 0.113).
Based on these values, the first simulation setting ('S1',
noted on the face of Figure 2) represents an 'average'
case with $\mathrm{MR}_{gluc}^{max}$ and $\sigma_\epsilon$ set at their sample mean values
of 48 and 0.048. The second ('S2') and third ('S3') settings
(likewise noted on the face of Figure 2) represent
cases with strong and weak signal-to-noise ratios, where
$\mathrm{MR}_{gluc}^{max}$ and $\sigma_\epsilon$ are set to (55, 0.028) and (38, 0.057),
respectively. In each simulation, glucose was sampled
according to $[glc] \sim N(90, 25^2)$, the approximate marginal
distribution of glucose across the sample data, and $K_M$
remained fixed at 130 mg/dL.

For simulations under the null hypothesis, the maximal
uptake rate $\mathrm{MR}_{gluc}^{max}$ was set the same in the control
and treatment groups, and we evaluated the effect on
the false positive rate (i.e., concluding that there is a
treatment effect when in fact there is none) caused
merely by a change in mean blood glucose. Mean blood
glucose changes of 10%, 20%, and 30% were assessed.

Simulations under the alternative hypothesis compared
the power of the t-tests to detect treatment effects ($\delta$)
corresponding to an approximate 20% to 30% reduction
in the tumor glucose uptake rate limit $\mathrm{MR}_{gluc}^{max}$ while
keeping the glucose distribution the same. Sample sizes
were chosen between n = 6 and n = 12.

[Figure 2: scatter plot of the estimated standard deviation ($\epsilon$) versus $\mathrm{MR}_{gluc}^{max}$; one axis is labeled std.dev(e), with ticks from 0.02 to 0.10.]

Figure 2 Estimates of $\mathrm{MR}_{gluc}^{max}$ and standard deviation ($\epsilon$) in the 66 studies described in Table 2. Illustrative cases discussed in the text are
marked as S1, S2, and S3.

The robustness of $\mathrm{MR}_{gluc}^{max}$ to errors in [glucose] and
$K_M$ was also investigated by simulations. For errors in
the measurement of blood glucose, we replaced the
quantity $K_i \cdot (K_M + [glc])$ by $K_i \cdot (K_M + [glc]^*)$, where
$[glc]^* = [glc] + N(0, 4^2)$. That is, the $K_i$ values were generated
using the correct (uncorrupted) glucose values [glc],
while $\mathrm{MR}_{gluc}^{max}$ was estimated using observed (corrupted)
glucose $[glc]^*$. A similar process of substitution was used
with $K_M$, using scenarios ($K_M$ = 100 mg/dL, $K_M^*$ = 130 mg/dL)
and ($K_M$ = 130 mg/dL, $K_M^*$ = 100 mg/dL).
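
A minimal Python sketch of this substitution is given below. The function and argument names are hypothetical, and the default parameter values are the S1 settings from the text: $K_i$ is generated with the true glucose and true $K_M$, while $\mathrm{MR}_{gluc}^{max}$ is estimated from the corrupted glucose and the assumed $K_M^*$.

```python
import random


def mr_max_estimate(ki, glc_obs, km_assumed=130.0):
    """Estimate MRgluc_max from Ki using observed glucose and an assumed KM."""
    return ki * (km_assumed + glc_obs)


def corrupted_estimates(mr_max, km_true, km_assumed, n, mu_g=90.0, sigma_g=25.0,
                        sigma_e=0.048, sigma_meas=4.0, seed=0):
    """Simulate n MRgluc_max estimates with glucose measurement error and KM mismatch."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n):
        glc = rng.gauss(mu_g, sigma_g)                           # true glucose
        ki = mr_max / (km_true + glc) + rng.gauss(0.0, sigma_e)  # Ki from the true values
        glc_obs = glc + rng.gauss(0.0, sigma_meas)               # corrupted glucose reading
        estimates.append(mr_max_estimate(ki, glc_obs, km_assumed))
    return estimates
```

Calling `corrupted_estimates(48.0, 100.0, 130.0, 12)` corresponds to the first $K_M$ scenario above; feeding the resulting control and treatment samples to a two-sample t-test gives the robustness comparison.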

Results and discussion 

Results 

Statistical and blood glucose-induced false-positive error 
rates 

In the absence of any glucose bias between the control
and treatment groups, the t-tests based on $K_i$, $\mathrm{MR}_{gluc}^{max}$,
and $\mathrm{MR}_{gluc}$ all have simulated false positive rates which
are consistent with the nominal statistical type I false
positive error rate of α = 0.05. However, as seen in
Table 3, for the first simulation setting with n = 12
observations per group, only the test based on $\mathrm{MR}_{gluc}^{max}$
preserves the correct false positive error rate in the presence
of a glucose bias, while the tests based on $K_i$ and
$\mathrm{MR}_{gluc}$ both perform increasingly poorly as the magnitude
of the bias grows. The increase in the false positive
rate can be understood by noting that any glucose bias
induces a shift in $K_i$ that is false with regard to effects
intrinsic to the tumor. Specifically, under the Michaelis-Menten
model, a shift in mean glucose between the
control and treatment groups by $\delta_g$ units translates into
an approximate (first-order) $-\mathrm{MR}_{gluc}^{max}/(K_M + \mu_g)^2 \times \delta_g$
change in the mean level of $K_i$ (see Additional file 3).
For instance, in the first simulation setting S1, a 30%
average increase in mean glucose from $\mu_g$ = 90 in the
control to 117 mg/dL in the treatment group induces a
false, average change in $K_i$ of -0.0268 per second or
approximately -11.0%. Substituting for $\delta K_i$ in the analytical
power equation (see Equation 1 in Additional file 3)
yields an estimated false positive error rate of 19.3%, in
close agreement with the simulated value of 18.4% (see
Table 3). The same strong effect on the false positive
error rate due to a glucose shift was observed for the
second and the third simulation settings, S2 and S3
(results not shown).

Table 3 False positive error rates (%)

Glucose bias                  -30%   -20%   -10%   0%    10%   20%    30%
$K_i$                         25.7   13.1   6.5    4.8   6.5   12.5   18.4
$\mathrm{MR}_{gluc}^{max}$    5.0    4.9    5.4    4.9   4.9   5.0    4.5
$\mathrm{MR}_{gluc}$          41.8   18.2   8.6    5.1   6.8   14.2   23.6

The error rates are expressed as percentages for a two-sided
t-test at level α = 0.05 based on $K_i$, $\mathrm{MR}_{gluc}^{max}$, and
$\mathrm{MR}_{gluc}$ as a function of glucose bias. Glucose bias is
defined as the percent change in mean glucose between
the control and treatment groups. Here, $\mathrm{MR}_{gluc}^{max}$ = 48,
$\sigma_\epsilon$ = 0.048, n = 12.
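
The glucose-induced shift in $K_i$ quoted for setting S1 can be checked with a few lines of Python. The helper names below are illustrative assumptions; the numbers ($\mathrm{MR}_{gluc}^{max}$ = 48, $K_M$ = 130 mg/dL, $\mu_g$ = 90 mg/dL, 30% glucose increase) are those given in the text. The sketch evaluates both the first-order expression and the exact Michaelis-Menten difference.

```python
def ki_mean(mr_max, glc, km=130.0):
    """Noise-free Ki under the Michaelis-Menten model."""
    return mr_max / (km + glc)


def first_order_shift(mr_max, mu_g, delta_g, km=130.0):
    """First-order change in mean Ki for a shift delta_g in mean glucose."""
    return -mr_max / (km + mu_g) ** 2 * delta_g


mr_max, mu_g = 48.0, 90.0
delta_g = 0.30 * mu_g                                  # 30% increase: 90 -> 117 mg/dL
approx = first_order_shift(mr_max, mu_g, delta_g)
exact = ki_mean(mr_max, mu_g + delta_g) - ki_mean(mr_max, mu_g)
relative = 100.0 * exact / ki_mean(mr_max, mu_g)       # exact change as a percentage
print(approx, exact, relative)
```

The first-order value reproduces the quoted change of about -0.0268, while the exact model difference is somewhat smaller in magnitude and corresponds to a relative change of roughly -11%.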

As predicted by the derivations, all three metrics
($K_i$, $\mathrm{MR}_{gluc}$, and $\mathrm{MR}_{gluc}^{max}$) correctly accepted the null
hypothesis at baseline in the 42 comparisons of the control
with treatment groups in the full experimental data
(Table 2). Also as expected, false-positive results began
to appear as the data were resampled at smaller sample
sizes. At sample size n = 8, for example, only one comparison
showed high false positive rates by $K_i$ and
$\mathrm{MR}_{gluc}^{max}$, at which point $\mathrm{MR}_{gluc}$ gave false positives in 6
out of the 42 studies (14%).

Elimination of $\mathrm{MR}_{gluc}$ from further consideration

Because results based on $\mathrm{MR}_{gluc}$ were highly influenced
by relatively modest levels of glucose bias (Table 3),
results that we considered to be false in terms of treatment
response, we judged that the most suitable
alternative to $\mathrm{MR}_{gluc}^{max}$ was the (uncorrected) $K_i$. We
henceforth simplify the presentation of simulation
results and analytical derivations by restricting them
only to $K_i$ and $\mathrm{MR}_{gluc}^{max}$. The performance of $\mathrm{MR}_{gluc}$ in
the experimental data is, however, shown alongside
$K_i$ and $\mathrm{MR}_{gluc}^{max}$ (Additional file 1 and Figure 1).

Statistical power in theory and in simulation 

As shown in the analytical power derivations presented
in Additional file 3, an improvement in power for
$\mathrm{MR}_{gluc}^{max}$, $P_m$, relative to the power for $K_i$, $P_k$, occurs
whenever the coefficient of variation (CV) in $K_i$ evaluated
at the mean glucose level is less than 1. That is,
with $P_k$, $P_m$ the power curves for a test of means of $K_i$
and $\mathrm{MR}_{gluc}^{max}$, respectively, then, whenever
$\mathrm{CV} = \sigma_\epsilon / K_i(\mu_g) < 1$, where $K_i(\mu_g) = \mathrm{MR}_{gluc}^{max}/(K_M + \mu_g)$,
we have $P_m > P_k$. Moreover, through manipulation of Equations 1
and 2 in Additional file 3, we see that the difference
$P_m - P_k$ is monotonic, increasing with decreasing CV. Further,
the difference $P_m - P_k$ grows as $\sigma_g$ increases (holding
CV constant). We now detail these facts by simulation.
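
To make the CV condition concrete, the Python sketch below computes the CV of $K_i$ at the mean glucose level for the three simulation settings, together with an approximate ratio of standardized effect sizes for the two tests. The ratio is an assumption of this sketch, not the derivation in Additional file 3: it uses a first-order (delta-method) approximation to the variance of $K_i$ and treats the two groups as having equal variance. All function names are hypothetical.

```python
import math


def cv_ki(mr_max, sigma_e, mu_g=90.0, km=130.0):
    """CV of Ki evaluated at the mean glucose level: sigma_e / Ki(mu_g)."""
    return sigma_e / (mr_max / (km + mu_g))


def effect_size_ratio(mr_max, sigma_e, mu_g=90.0, sigma_g=25.0, km=130.0):
    """Approximate (MRgluc_max effect size) / (Ki effect size); above 1 favors MRgluc_max."""
    ki_bar = mr_max / (km + mu_g)
    r = sigma_g / (km + mu_g)
    sd_ki = math.sqrt(sigma_e ** 2 + (ki_bar * r) ** 2)   # delta-method SD of Ki
    sd_mr_scaled = sigma_e * math.sqrt(1.0 + r ** 2)      # SD of Ki*(KM+glc), divided by (KM+mu_g)
    return sd_ki / sd_mr_scaled


# Settings S1, S2, S3 as (MRgluc_max, sigma_e) from the text.
for name, (mr_max, sigma_e) in {"S1": (48.0, 0.048), "S2": (55.0, 0.028), "S3": (38.0, 0.057)}.items():
    print(name, cv_ki(mr_max, sigma_e), effect_size_ratio(mr_max, sigma_e))
```

Under these assumptions the ratio exceeds 1 exactly when CV < 1, equals 1 when $\sigma_g$ = 0, and is largest for the low-noise setting S2, in line with the ordering of the power improvements reported for S1, S2, and S3.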

Figure 3 shows the theoretical power curves $P_k$ (blue
solid line) and $P_m$ (black) for the first and second simulation
settings, S1 (left panel) and S2 (right panel). The
first case, S1, represents an average study with parameters
$\mathrm{MR}_{gluc}^{max}$ and $\sigma_\epsilon$ set at the mean levels and with
n = 10; a potential improvement of approximately 10%
occurs at a treatment effect of $\delta$ = 0.25 (cyan solid line),
with a corresponding simulated improvement of 9.8%.
The second case, S2, exemplifies a study with a particularly
good signal-to-noise ratio, i.e., low $\sigma_\epsilon$. Here, an
improvement of approximately 29.2% occurs for $\delta$ = 0.18,
with a simulated improvement of 29.9%.

Figure 3 Power curves as a function of the treatment effect ($\delta$). Simulation settings S1 and S2 are as shown in Figure 2. In S1 (left), [...]
power curves for $\mathrm{MR}_{gluc}^{max}$ and $K_i$, respectively (see derivations in Additional file 3), while the solid cyan lines show the power improvement.
The dotted cyan line shows the peak simulated improvement in power for the two settings S1 and S2 at $\delta$ = 0.25 and $\delta$ = 0.18, respectively.

The third simulation case, S3, representing very noisy
data where $\mathrm{MR}_{gluc}^{max}$ = 38, $\sigma_\epsilon$ = 0.057, n = 10, has a maximum
improvement in power of 2.2%, occurring for
$\delta$ = 0.55 (plot not shown). This indicates that with low
signal-to-noise ratios in the $K_i$ measurement, there is no
meaningful improvement in power from using $\mathrm{MR}_{gluc}^{max}$.
However, cases with high coefficient of variation inevitably
have low power and require either very large treatment
effects or very large sample sizes to detect a difference in
means. Indeed, for case S3, we would require n = 40 for
80% power to detect a treatment effect of $\delta$ = 0.25.

For the case n = 8, $\delta$ = 0.3, the left panel of Figure 4
shows the power improvement as a function of the coefficient
of variation across the 66 cohorts considered
(Table 2). The right panel of Figure 4 offers an alternative
perspective on this power improvement, being the
sample size required to perform a well-powered study
(80% chance of correctly rejecting the null hypothesis).
An average study that requires 10 animals per group
using $K_i$ is equivalently powered using 8 animals per
group with $\mathrm{MR}_{gluc}^{max}$. In addition, the $\mathrm{MR}_{gluc}^{max}$ measurements
resist false positive results in the event of glucose bias.



Congruent with the main result outlined in the derivations
presented in Additional file 3, the improvement
in power is strongly dependent on the coefficient of
variation in $K_i$, with the largest power improvement
reaching approximately 25%. Moreover, the greater the
coefficient of variation for $K_i$, the less we can discern
the effects due to glucose; however, as noted, no test
performs well with excessively noisy data.

Statistical power in experimental data 

On average and in agreement with the simulations,
$\mathrm{MR}_{gluc}^{max}$ gave greater power than $K_i$ or $\mathrm{MR}_{gluc}$ in
detecting the known direct on-tumor drug effects in
all three tumor treatment models studied (Table 1 and
Figure 1). As expected, all metrics progressively lost
power as the sample size decreased. For example, in
Figure 1A at eight mice per group, $\mathrm{MR}_{gluc}^{max}$ was able to
reject the null hypothesis in 93% of the 4,000 combinations
of control vs. treatment groups, while $K_i$ did so in
only 52% of the sample combinations. In Figure 1B,
$\mathrm{MR}_{gluc}$ completely misses the treatment effect at all sample
sizes, but $K_i$ and $\mathrm{MR}_{gluc}^{max}$ correctly identified it. Lastly,
in Figure 1C, looking at six mice per group, we observe
that $\mathrm{MR}_{gluc}^{max}$ detected a statistically significant difference
between the groups in 89% of all the sample combinations,
while $\mathrm{MR}_{gluc}$ did so in only 62% of the cases. However,
caution must be exercised in drawing fully general
conclusions from these limited and somewhat noisy
experimental data alone.

Discussion 
Application of MR g i uc 

The original intent behind the multiplication of Ki by
[glucose] was to estimate the metabolic rate of glucose
(MRgluc) in tissue under given blood glucose levels based
on rate constants derived from monitoring a radioactive
glucose-like tracer in blood and tissue [13,32]. The
estimation implies the assumption that MRgluc depends on
substrate concentration, i.e., [glucose] in blood. It
follows that MRgluc is unsuitable for our particular task of
quantitatively compensating for changing glucose levels 
when comparing scans collected under different glucose 
conditions. Our results show that even seemingly small 
differences in blood glucose, such as the natural varia- 
tions within a group of similar individuals, are sufficient 
to warrant careful attention to glucose correction when 
making quantitative comparisons. 
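To illustrate why MRgluc tracks blood glucose, the following is a minimal sketch. It assumes that Ki follows the Michaelis-Menten form Ki = MRglucMAX/(KM + [glc]) with KM = 130 mg/dL, as used in this work. The function names and the example value of MRglucMAX are hypothetical and chosen only for illustration.

```python
KM = 130.0  # mg/dL, half-saturation constant used in this work


def ki_from_mrmax(mr_max, glc, km=KM):
    """Ki implied by the Michaelis-Menten model at blood glucose glc (mg/dL)."""
    return mr_max / (km + glc)


def metrics(ki, glc, km=KM):
    """Return (MRgluc, glucose-normalized MRgluc, MRglucMAX) for one scan."""
    mr_gluc = ki * glc                # scan-time metabolic rate
    mr_gluc_norm = ki * glc / 100.0   # normalized to 100 mg/dL
    mr_gluc_max = ki * (km + glc)     # extrapolated to glucose saturation
    return mr_gluc, mr_gluc_norm, mr_gluc_max


# Same tumor (hypothetical MRglucMAX = 10, arbitrary units) at two glucose levels.
for glc in (100.0, 120.0):
    ki = ki_from_mrmax(10.0, glc)
    print(glc, round(ki, 4), [round(m, 3) for m in metrics(ki, glc)])
```

Under these assumptions, a 20% rise in glucose lowers Ki and raises MRgluc although the tumor itself is unchanged, whereas MRglucMAX stays constant.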

The lumped constant 

Measurement of the lumped constant (LC) is not trivial, 
and thus, the (ideal) per-patient or per-lesion values are 
rarely measured and reported with FDG-PET treatment 
studies. Instead, a common constant value of LC is ap- 
plied to all scans. This approach was employed in this 
study too with an assumed LC value of 1, and as previ- 
ously noted [14], the chosen value of LC simply behaves 
as a scaling factor common to every data point and thus 
makes no difference to calculated group statistics such 
as the coefficient of variation, t-test p-values, or correla-
tions with blood glucose levels. The statistical results
presented remain equally valid at all (non-zero) values
of LC.
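The scale invariance described above can be checked numerically. The sketch below uses made-up uptake values (not study data) and applies LC as a common divisor, which is an assumption about how LC enters the calculation. The helper names `cv` and `pooled_t` are hypothetical.

```python
import statistics as st


def cv(values):
    """Coefficient of variation: sample SD divided by the mean."""
    return st.stdev(values) / st.mean(values)


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (equal-variance t-test)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * st.variance(a) + (nb - 1) * st.variance(b)) / (na + nb - 2)
    return (st.mean(a) - st.mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5


# Hypothetical uptake values for two groups (arbitrary units, not study data).
control = [9.1, 10.4, 11.2, 9.8, 10.9, 8.7]
treated = [7.9, 8.8, 9.5, 7.2, 8.1, 9.0]

for lc in (1.0, 0.6):  # dividing by LC rescales every data point equally
    c = [x / lc for x in control]
    t = [x / lc for x in treated]
    print(lc, round(cv(c), 6), round(pooled_t(c, t), 6))
```

Because the t statistic and the degrees of freedom are unchanged by the rescaling, the p-value is unchanged as well.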

Glucose bias and false positive test results 

All three metrics performed correctly in terms of the 
false positive rate in the absence of any systematic glu- 
cose difference between the treatment groups. The fact 
that the t-tests based on Ki and MRgluc suffer an
increased false positive error rate under a glucose shift
(Table 3) renders these tests admissible and useful only
if one is certain that a treatment can have no systematic
effect on glucose. Since blood glucose levels may vary,
we suggest that MRglucMAX makes a more robust and useful
default metric for FDG-PET data. 
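The effect of a glucose shift on the false positive rate can be sketched with a small Monte Carlo experiment. This is an illustration under stated assumptions, not a reproduction of the simulations behind Table 3: the group size (n = 8), glucose means (100 vs. 120 mg/dL), glucose SD, noise level, and all function names are hypothetical choices, and Ki is assumed to follow the Michaelis-Menten model with multiplicative Gaussian noise.

```python
import random
import statistics as st

KM = 130.0      # mg/dL, as used in this work
T_CRIT = 2.145  # two-sided 5% critical value of Student's t with 14 d.f.


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * st.variance(a) + (nb - 1) * st.variance(b)) / (na + nb - 2)
    return (st.mean(a) - st.mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5


def simulate_group(rng, n, glc_mean, mr_max=10.0, glc_sd=10.0, cv=0.15):
    """Return per-animal (Ki, MRgluc, MRglucMAX) lists for one group."""
    ki, mr_gluc, mr_gluc_max = [], [], []
    for _ in range(n):
        glc = rng.gauss(glc_mean, glc_sd)
        k = mr_max / (KM + glc) * (1.0 + rng.gauss(0.0, cv))
        ki.append(k)
        mr_gluc.append(k * glc)
        mr_gluc_max.append(k * (KM + glc))
    return ki, mr_gluc, mr_gluc_max


def false_positive_rates(n_sims=2000, n=8, seed=1):
    """Fraction of simulated studies rejecting H0 although MRglucMAX is unchanged."""
    rng = random.Random(seed)
    hits = [0, 0, 0]
    for _ in range(n_sims):
        control = simulate_group(rng, n, glc_mean=100.0)
        shifted = simulate_group(rng, n, glc_mean=120.0)  # 20% glucose shift only
        for j in range(3):
            if abs(pooled_t(control[j], shifted[j])) > T_CRIT:
                hits[j] += 1
    return [h / n_sims for h in hits]


fp_ki, fp_gluc, fp_max = false_positive_rates()
print(fp_ki, fp_gluc, fp_max)
```

In this setting the true tumor uptake capacity is identical in both groups, so every rejection is a false positive. The rate for MRglucMAX should stay near the nominal 5%, while the rates for Ki and MRgluc should be inflated by the glucose shift.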

Statistical power in the absence of any glucose bias 

Figure 4 (left hand side) shows the simulated improve- 
ment in power for a modest treatment effect of 20% and 
a sample size of n = 8. As can be seen, the power
improvement can be as large as 25% and is highly
dependent on CV. As predicted by the analytical
derivations, the benefit of using MRglucMAX is most pronounced at
low CV. Conversely, for values of CV greater than 35%, 
the power benefit is negligible even though the benefit 
of reduced glucose bias remains. However, for data that 
is very variable (relative to the mean), larger treatment 
effects or sample sizes are always required for adequate 
power, a fact that is detailed in the right hand plot of 
Figure 4. 

Figure 4 (right hand side) shows the required sample
size for Ki and MRglucMAX as a function of the coefficient of
variation in order for a study to have 80% power with a
treatment effect size of 30% (δ = 0.3). As expected, for
both Ki and MRglucMAX, the required sample size is an
increasing function of the CV value. We see that a CV of
22% (the average in our experiments) requires a sample
size of n = 10 per group for Ki and n = 8 per group for
MRglucMAX. To further describe the results, we can assume a
fixed sample size and consider what proportion of our
66 experimental cohorts represented adequately powered
groups for a treatment study: For the sample size of
n = 8, we see that 48% were adequately powered using
MRglucMAX, whereas only 26% were adequately powered with
Ki. For a sample size of n = 12 there are more adequately
powered groups, of course, but still a benefit to using
MRglucMAX: 76% using MRglucMAX and 59% using Ki.
Independent of CV, the sample size saving achieved through the
use of MRglucMAX in this simulation setting is approximately
two mice; in (relatively rare) situations where a CV as
low as 10% can be anticipated, we see that studies can
be adequately powered with only a handful of animals
per group.
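The dependence of the required sample size on CV can be approximated with the textbook normal-approximation formula for a two-sample comparison. This is a rough sketch, not the simulation method used for Figure 4, and it tends to give slightly smaller sample sizes than t-based or simulated calculations at small n. The function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(cv, effect, alpha=0.05, power=0.80):
    """Approximate animals per group for a two-sided, two-sample comparison.

    cv     : coefficient of variation of the metric (SD / mean)
    effect : fractional treatment effect (0.3 means a 30% change)
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2.0 * (z_alpha + z_beta) ** 2 * (cv / effect) ** 2)


for cv in (0.10, 0.22, 0.35):
    print(cv, n_per_group(cv, effect=0.3))
```

Because the required n scales with the square of CV/effect, any metric that lowers the CV for the same treatment effect reduces the number of animals needed.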

Understanding this behavior has practical value in 
designing appropriately powered preclinical FDG-PET 
experiments and, perhaps, in permitting a futility ana- 
lysis to be conducted after beginning a study with base- 
line scans and before expending further significant effort 
in drug dosing and repeated scanning. 

Glucose 'normalization' and errors in the measurement of 
blood glucose 

Glucose sampling errors have been postulated as a
source of the variability experienced [9,10] when applying
the common [glucose]/constant normalization method
[33], which is analogous to estimating MRgluc at the
population mean glucose measurement (the value of the
constant), typically given as 5 mM or 100 mg/dL.

We suggest that the problem with this normalization
scheme lies not with the glucose measurements, but
with the linear nature of the algorithm. Rather than
scaling linearly to the population mean glucose value,
MRglucMAX asymptotically follows the Michaelis-Menten
extrapolation to a hypothetical saturating glucose level.
Simulations showed that MRglucMAX results were robust
even with relatively large 10% errors in the glucose
measurements (full results not shown). This can be intuited
by noticing that the KM term is on the order of the
[glucose] term, making the glucose measurement error, ε_glc,
a small part of the total correction factor, KM + [glc] +
ε_glc. We also note that the algebraic form of this
correction factor, i.e., [glucose] + constant, appears as a
solution in analytical derivations that simply start with the
very general assumption that Ki is negatively correlated
with [glucose] over a limited range of glucose values.
This is presented in Additional file 3 for the interested
reader.
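The intuition about measurement error can be made concrete with a short calculation. The sketch below compares the relative error that a glucose measurement error introduces into the linear factor [glc] with the error it introduces into the factor (KM + [glc]). The true glucose value of 100 mg/dL and the function names are hypothetical.

```python
KM = 130.0  # mg/dL, as used in this work


def relative_error_linear(glc, err):
    """Relative error in the [glc] factor used by MRgluc."""
    return err / glc


def relative_error_mm(glc, err, km=KM):
    """Relative error in the (KM + [glc]) factor used by MRglucMAX."""
    return err / (km + glc)


glc_true = 100.0       # mg/dL, hypothetical true value
err = 0.10 * glc_true  # a 10% glucose measurement error
print(relative_error_linear(glc_true, err))
print(relative_error_mm(glc_true, err))
```

With these numbers the same 10 mg/dL error changes the linear factor by 10% but changes the (KM + [glc]) factor by less than 5%.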

Optimal group comparisons with linear regression 

We note that MRglucMAX is optimally estimated by
regressing Ki on the quantity 1/(KM + [glc]) under the
Michaelis-Menten model assumptions specified, with
the noise process ε following the Gaussian distribution
and with a fixed value for KM. Here, we condition on
the glucose measurements and set the intercept to
zero. Given our setup, in the regression framework,
the t-test of equality of the maximal uptake rates
MRglucMAX,C and MRglucMAX,T is a likelihood ratio test and the
uniformly most powerful unbiased test [34]. Moreover,
statistically speaking, the regression estimator is best
linear unbiased under non-Gaussian assumptions [35].
We also note that the variance of the regression
estimator and that of the sample average MRglucMAX are close
provided that the spread in the term (KM + [glc]) is
low relative to its mean. In our setting, since
σ_glc/(KM + μ_glc) ≈ 0.1, the linear regression and sample
average solutions are very close to each other, and
either may be used when testing for a treatment effect.
Thus we expect that the familiar and straightforward
use of sample means (averaging data from multiple
individuals) will be satisfactory when using MRglucMAX in
practice, just as it is for Ki.
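The closeness of the two estimators can be illustrated on synthetic data. The sketch below assumes the model Ki = MRglucMAX/(KM + [glc]) plus additive Gaussian noise. The cohort size, glucose distribution, noise level, and function names are hypothetical, with the glucose SD chosen so that the spread of (KM + [glc]) relative to its mean is about 0.1.

```python
import random
import statistics as st

KM = 130.0  # mg/dL, as used in this work


def regression_estimate(ki, glc, km=KM):
    """Zero-intercept least-squares slope of Ki on x = 1/(KM + [glc])."""
    x = [1.0 / (km + g) for g in glc]
    return sum(xi * k for xi, k in zip(x, ki)) / sum(xi * xi for xi in x)


def sample_average_estimate(ki, glc, km=KM):
    """Mean of the per-animal values Ki * (KM + [glc])."""
    return st.mean([k * (km + g) for k, g in zip(ki, glc)])


# Hypothetical cohort: true MRglucMAX = 10 (arbitrary units).
rng = random.Random(0)
glc = [rng.gauss(100.0, 25.0) for _ in range(8)]
ki = [10.0 / (KM + g) + rng.gauss(0.0, 0.005) for g in glc]

reg = regression_estimate(ki, glc)
avg = sample_average_estimate(ki, glc)
print(round(reg, 3), round(avg, 3))
```

Both estimators are weighted averages of the per-animal values Ki * (KM + [glc]); the regression weights are proportional to 1/(KM + [glc])^2, so the two agree closely when that factor varies little across animals.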

Conclusions 

Quantitative comparisons of FDG-PET scans across 
time or between animals are subject to an elevated 
risk of erroneous results when they ignore blood glu- 
cose levels. Multiplying PET data by blood glucose 
levels or 'normalizing' the blood glucose to a common 
reference value (100 mg/dL, for example) offers no 
protection; in fact, it is frequently counterproductive. 
However, by calculating the hypothetical value of the
maximum glucose uptake rate under saturating glucose
conditions, MRglucMAX, we see reduced problems of
glucose bias and gain increased statistical power to detect
treatment effects. Based on the average properties
observed across 66 preclinical cohorts, the power
improvement for MRglucMAX was equivalent to reducing the
sample size by 20% compared to the next best option,
which was using the uncorrected Ki data.

These benefits were realized in our preclinical studies
of tyrosine kinase inhibitors by computing MRglucMAX =
Ki * (KM + [glc]) using a KM of 130 mg/dL. The
analytical derivations and simulation methods described in this
work should facilitate the exploration and assessment of
our method in other settings. Because it is superior to
making no glucose correction and its benefits are easily
obtained and come with no penalty, we highly
recommend the use of (KM + [glc]) rather than [glucose] or
[glucose]/(100 mg/dL) as the glucose correction factor
in quantitative FDG-PET studies.